Morgan Brand (@Morgs_John), PhD student, UCT
#WorldAqua17
Robert Schlegel (@wiederweiter), PhD student, UWC
29 June 2017 Room 1.43
Why?
The not-so-distant past
For working - R
For writing - LaTeX
For tracking - Git
The present and the future
For working - RStudio
For wrangling - tidyverse
For writing - R Markdown + Bookdown + Thesisdown + Blogdown
For collaborating - GitHub
'Modern' scientists?
Roles
- Communication
  - Popular articles
  - Public speaking
- Interdisciplinary work
- Collaboration
  - Beyond your lab
A traditional approach to the scientific method
1. Devise a fancy question and call it a hypothesis
2. Formulate a means of collecting the relevant data
3. Import the data set into a statistical software package
4. Run the procedure to get results
5. Copy and paste the appropriate pieces of the analysis into a document editor
6. Add descriptions
7. Finish/submit the report for comments
REPEAT steps 2 to 7 indefinitely after receiving comments...
Disadvantages of this process
The process of data capture is not open
Lots of manual work (prone to errors)
Tedious (who likes to carefully copy-and-paste?)
Likely not recorded (did you write down every step you followed to get your analysis?)
What if you made an error at the beginning of your analysis? If your data had an error? If your hypothesis was biased?
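By contrast, the import, analysis and write-up steps can live in a script that is simply rerun whenever the data or the question changes. The sketch below is a minimal base R illustration; the data, the column names (`site`, `temp`, `depth`) and the output file name are invented for the example.

```r
# Invented example data; in practice this would be a file path
raw <- "site,temp,depth
A,14.2,5
A,13.8,10
B,16.1,5
B,15.4,10"

# Import: read the data straight into R, no manual copying
dat <- read.csv(text = raw)

# Analyse: fit a simple linear model of temperature against depth
fit <- lm(temp ~ depth, data = dat)

# Report: write the coefficients to a file instead of copy-pasting them
results <- data.frame(term = names(coef(fit)), estimate = unname(coef(fit)))
write.csv(results, file.path(tempdir(), "results.csv"), row.names = FALSE)
print(results)
```

If an error turns up in the raw data, fixing it and rerunning the script regenerates every downstream number, with no copy-and-paste step to redo.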
Why R?
R is a free software package for statistical analysis and graphics.
- It excels in helping you with:
  - data manipulation
  - automation
  - reproducibility
  - improved accuracy
  - error finding
  - customizability
  - beautiful visualizations
- Any downsides?
R, R console and RStudio
R is a programming language and environment "made by statisticians, for statisticians"
The R console is the basic interface that ships with R; it favours command-line programmers
RStudio is an Integrated Development Environment (IDE) that helps you develop programs in R
You can use R without using RStudio, but you can't use RStudio without using R
Tidyverse in R
Tidying is the act of converting “messy” data frames into “tidy” ones, in which each variable is a column and each observation is a row
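A minimal sketch of tidying with tidyr: a made-up table with one column per year is reshaped so that each row holds one site-year observation. It uses `pivot_longer()`, which needs tidyr 1.0.0 or later; earlier versions (including those current when this talk was given) used `gather()` for the same job.

```r
library(tidyr)

# A "messy" table: one row per site, one column per year (made-up values)
messy <- data.frame(
  site   = c("A", "B"),
  `2015` = c(14.2, 16.1),
  `2016` = c(13.8, 15.4),
  check.names = FALSE
)

# Tidy it: one row per observation (site x year), one column per variable
tidy <- pivot_longer(messy, cols = -site,
                     names_to = "year", values_to = "temp")
print(tidy)
```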
The tidyverse is a set of packages that work in harmony
The core tidyverse packages are:
- ggplot2, for data visualisation.
- dplyr, for data manipulation.
- tidyr, for data tidying.
- readr, for data import.
- purrr, for functional programming.
- tibble, for tibbles, a modern re-imagining of data frames.
Installing the tidyverse package also installs a selection of other tidyverse packages
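To show how three of the core packages listed above fit together, here is a small sketch: readr imports a CSV, dplyr summarises it with the pipe, and ggplot2 builds a plot. The file contents and column names are invented for the example.

```r
library(readr)
library(dplyr)
library(ggplot2)

# readr: import a CSV file (written to a temporary file so the example is self-contained)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,temp", "A,14.2", "A,13.8", "B,16.1", "B,15.4"), csv_path)
obs <- read_csv(csv_path, col_types = "cd")  # c = character, d = double

# dplyr: group and summarise, chaining the steps with the pipe
site_means <- obs %>%
  group_by(site) %>%
  summarise(mean_temp = mean(temp), n = n())

# ggplot2: build a bar chart of the summary (print(p) draws it)
p <- ggplot(site_means, aes(x = site, y = mean_temp)) +
  geom_col()

print(site_means)
```

Because every package takes and returns data frames (tibbles), the output of one step can be passed straight into the next.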
R Markdown?
“Literate programming”
Embed R code in a Markdown document
Renders textual output along with graphics
You can write your entire paper/report (text, code, analysis, graphics, etc.) all in R Markdown.
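A minimal sketch of such a document, written out from R so the example is self-contained: a YAML header, some text, one R code chunk and one inline R expression. The file name and contents are illustrative; rendering requires the rmarkdown package and pandoc, so the code checks for both first.

```r
fence <- strrep("`", 3)  # three backticks open and close an R code chunk
rmd_path <- file.path(tempdir(), "report.Rmd")

writeLines(c(
  "---",
  "title: \"A minimal report\"",
  "output: html_document",
  "---",
  "",
  "Text, code and results live in one file.",
  "",
  paste0(fence, "{r temps}"),
  "temps <- c(14.2, 13.8, 16.1, 15.4)",
  "summary(temps)",
  "hist(temps)",
  fence,
  "",
  "The mean temperature is `r mean(temps)` degrees.",
  ""
), rmd_path)

# Knitting runs the chunk and weaves its text output and the histogram
# into an HTML file next to the .Rmd
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```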
Bookdown with R Markdown
Bookdown is one of the more recent additions to the R-universe.
Some highlights are:
- Multiple output formats
- Focus on writing the content, not typesetting
- Readers can interact with examples
- Feedback and contributions as the book is developed
- Integrates with version control
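As a sketch of the "multiple output formats" point, the loop below renders one set of source files to three formats with `bookdown::render_book()`. It assumes you run it from a bookdown project whose first file is `index.Rmd` (an assumption about the project layout); the PDF format additionally needs a LaTeX installation.

```r
# Output formats that bookdown provides for the same source files
formats <- c("bookdown::gitbook",    # HTML, browsable online
             "bookdown::pdf_book",   # PDF via LaTeX
             "bookdown::epub_book")  # e-book

# Only attempt to build when run from a bookdown project containing index.Rmd
if (file.exists("index.Rmd") && requireNamespace("bookdown", quietly = TRUE)) {
  for (fmt in formats) {
    bookdown::render_book("index.Rmd", output_format = fmt)
  }
}
```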
Thesisdown with R Markdown
Thesisdown is built from Bookdown
Its output currently comes in four versions:
- Word
- ePub
- HTML and Gitbook
Thesisdown with R Markdown - Files
Thesisdown with R Markdown - PDF
Thesisdown with R Markdown - YAML
Blogdown with R Markdown
You can now extend your online voice by presenting the work produced with your research tools as a blog!
- The R package Blogdown allows you to create websites using R Markdown
The website is generated from R Markdown documents, so all your results, analysis and graphics can be computed and rendered dynamically from R code to your website!
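A minimal sketch of starting such a site. `start_blog()` is a hypothetical helper, not part of blogdown; it wraps `blogdown::new_site()`, which creates a Hugo site skeleton and can install Hugo if it is missing. The call is guarded by `interactive()` because it downloads Hugo and a theme.

```r
# Hypothetical helper: start a new blogdown website in `dir`
start_blog <- function(dir = "my-blog") {
  if (!requireNamespace("blogdown", quietly = TRUE)) {
    stop("Install blogdown first: install.packages(\"blogdown\")")
  }
  dir.create(dir, showWarnings = FALSE)
  # new_site() builds a Hugo site skeleton with a content/ folder
  # where your R Markdown posts will live
  blogdown::new_site(dir = dir)
}

# Only run interactively, since it downloads Hugo and a theme
if (interactive()) {
  start_blog("my-blog")
}
```

From inside the site folder, `blogdown::new_post()` creates a new post and `blogdown::serve_site()` previews the site locally, re-running the R code in your posts as they change.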
Blogdown with R Markdown: Yihui Xie
Blogdown with R Markdown: Amber Thomas
Git?
Git and GitHub
Git is a version control system that lets you track changes to files over time
- Git manages the evolution of a set of files – called a repository
GitHub is a website for storing your Git-versioned files remotely
GitHub provides a home for your Git-based projects on the internet
If you are a student, you can get the Micro account, which includes five private repositories, for free!
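A minimal sketch of what "tracking a repository" means in practice, driven from R: create a repository, add a file and commit a snapshot. It assumes the `git` command-line tool is installed and on the PATH; the folder, file name, identity and commit message are all made up for the example. Pushing the repository to GitHub would be a further step not shown here.

```r
# Assumes the git command-line tool is installed and on the PATH
repo <- tempfile("analysis-repo")  # hypothetical project folder
dir.create(repo)

# Small helper that runs a git command inside `repo` and stops on failure
git <- function(...) {
  status <- system2("git", c("-C", shQuote(repo), ...))
  if (status != 0) stop("git command failed")
  invisible(status)
}

git("init", "-q")                         # turn the folder into a repository
writeLines("x <- rnorm(10)", file.path(repo, "analysis.R"))
git("add", "analysis.R")                  # stage the file
git("-c", "user.name=Example",            # identity given inline, so no global
    "-c", "user.email=example@example.com",  # git configuration is needed
    "commit", "-q", "-m", shQuote("First version of the analysis"))

# Each commit is a snapshot in the history of the repository
history <- system2("git", c("-C", shQuote(repo), "log", "--oneline"), stdout = TRUE)
print(history)
```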
GitHub
Reproducible Research
“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.”
Donald Knuth, Literate Programming (1984)
“Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them.”
Roger Peng, Johns Hopkins
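One small, concrete part of letting others verify a finding: any analysis that uses random numbers should fix and report its seed. The sketch below shows that the same seed reproduces the same draws; the seed value is arbitrary. Note that the same seed can still give different results across R versions (for example, `sample()` changed its default algorithm in R 3.6.0), which is one reason to record the R version as well.

```r
# A random result can only be checked by others if the seed is fixed and shared
set.seed(2017)
first_run <- rnorm(5)

set.seed(2017)
second_run <- rnorm(5)

identical(first_run, second_run)  # TRUE: same seed, same "random" numbers
```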
Session info
sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.6 (El Capitan)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## loaded via a namespace (and not attached):
##  [1] backports_1.0.5      magrittr_1.5         rprojroot_1.2
##  [4] tools_3.3.0          htmltools_0.3.6      yaml_2.1.14
##  [7] Rcpp_0.12.11         stringi_1.1.5        rmarkdown_1.6.0.9000
## [10] knitr_1.16.5         stringr_1.2.0        digest_0.6.12
## [13] evaluate_0.10
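Printing `sessionInfo()` is useful; saving it next to the analysis outputs makes it part of the permanent record, so others can match your R version and package versions. A minimal sketch; the file name `session-info.txt` is an arbitrary choice.

```r
# Save the session details alongside your results
info_file <- file.path(tempdir(), "session-info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# The first line records the R version, e.g. "R version 3.3.0 (2016-05-03)"
readLines(info_file, n = 1)
```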
Thanks and references
I would like to thank @Old_Man_Chester for his RPubs slides
Authoring Books with R Markdown
Happy Git and GitHub for the useR
Our path to better science in less time using open data science tools